Fix #1048: stop reporting 100% host CPU on SQL Server on Linux #1049

Merged: erikdarlingdata merged 1 commit into dev from feature/1048-linux-host-cpu on Jun 2, 2026

@erikdarlingdata
Owner

Problem

Reported in #1048: on SQL Server 2019 on Linux, the Dashboard's host CPU reads 100% constantly while the host is barely busy. SQL Server's own CPU number is correct — only the host/other-process figure is wrong.

Root cause (web-confirmed)

The CPU collector reads RING_BUFFER_SCHEDULER_MONITOR and derives other_process = 100 - SystemIdle - ProcessUtilization. On Linux, SystemIdle is always 0, which is a documented platform limitation: Microsoft's own sample query carries the literal comment -- SystemIdle on Linux will be 0. The derivation therefore collapses to other = 100 - 0 - sqlcpu, and the host total = 100 - SystemIdle reads 100% forever. No DMV exposes true host CPU on Linux.
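The arithmetic above can be checked with a small sketch. It is a plain Python model of the derivation, not the collector's T-SQL; `CpuSample` and `derive_legacy` are hypothetical names, and the sample values are made up for illustration.

```python
from typing import NamedTuple, Tuple


class CpuSample(NamedTuple):
    """One SCHEDULER_MONITOR record, reduced to the two fields the collector uses."""
    system_idle: int           # SystemIdle, always 0 on Linux
    process_utilization: int   # ProcessUtilization, SQL Server's own CPU


def derive_legacy(sample: CpuSample) -> Tuple[int, int]:
    """Pre-fix derivation: returns (other_process, host_total)."""
    other = 100 - sample.system_idle - sample.process_utilization
    total = sample.process_utilization + other  # equals 100 - SystemIdle
    return other, total


# Illustrative values only: same SQL Server load on both platforms.
windows = derive_legacy(CpuSample(system_idle=80, process_utilization=15))
linux = derive_legacy(CpuSample(system_idle=0, process_utilization=15))

print(windows)  # (5, 20)
print(linux)    # (85, 100)
```

With SystemIdle fixed at 0, the total is 100 regardless of what SQL Server itself is doing, which matches the symptom reported in #1048.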

Fix

Detect Linux (sys.dm_os_host_info.host_platform, via sp_executesql so SQL 2016 never binds the 2017+ DMV) and store NULL for other_process_cpu_utilization on Linux instead of fabricating it. Every consumer degrades to the correct SQL-only figure.
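The write-side rule can be sketched as follows. This is a Python model under stated assumptions, not the code in install/18 or RemoteCollectorService.Cpu: `other_process_cpu` is a hypothetical helper, `None` stands in for SQL NULL, and `host_platform` is assumed to be `None` when sys.dm_os_host_info does not exist (SQL 2016).

```python
from typing import Optional


def other_process_cpu(
    system_idle: int,
    process_utilization: int,
    host_platform: Optional[str],
) -> Optional[int]:
    """Value to store for other_process_cpu_utilization.

    host_platform models sys.dm_os_host_info.host_platform; None models
    an instance where that DMV is unavailable (assumption for this sketch).
    """
    if host_platform == "Linux":
        # SystemIdle is always 0 here, so any derived value would be fabricated.
        return None
    return 100 - system_idle - process_utilization


print(other_process_cpu(0, 15, "Linux"))     # None
print(other_process_cpu(80, 15, "Windows"))  # 5
print(other_process_cpu(80, 15, None))       # 5
```

Storing nothing rather than a derived number is what lets downstream readers tell "unknown" apart from a real measurement.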

| File | Change |
| --- | --- |
| install/18_collect_cpu_utilization_stats.sql | Linux detection; store NULL for other-process on Linux |
| Lite/Services/RemoteCollectorService.Cpu.cs | Same detection in the ring-buffer query; append NULL to DuckDB |
| install/02_create_tables.sql | other_process_cpu_utilization → nullable (fresh installs); NULL propagates to the total_cpu_utilization computed column |
| upgrades/2.11.0-to-2.12.0/03_make_other_process_cpu_nullable.sql | Migrate existing tables: drop computed col → widen base col to NULL → re-add computed col (in-place ALTER is blocked by the dependency). Idempotent + partial-failure safe |
| install/47_create_reporting_views.sql | cpu_spikes + daily_summary coalesce total → ISNULL(total, sqlserver) so high-CPU is still detected via SQL CPU on Linux |
| Dashboard/Services/DatabaseService.ResourceMetrics.cs, …/Overview.cs | Same ISNULL coalesce in the chart query, MCP query, and daily-summary predicate |

All other CPU consumers (fact collectors, anomaly/baseline/drilldown, FinOps) already key on sqlserver_cpu_utilization or coalesce NULLs to 0, so they degrade consistently. The installer re-runs install/* on upgrade, so the collector/view edits reach existing installs automatically.
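On the read side, the NULL propagation and the ISNULL(total, sqlserver) fallback can be modeled like this. It is a sketch, assuming the computed column is sqlserver + other as the commit message describes; `total_cpu`, `effective_total`, and the threshold of 80 are hypothetical and only illustrate the behavior.

```python
from typing import Optional


def total_cpu(sqlserver: int, other: Optional[int]) -> Optional[int]:
    """Models the computed column: NULL + anything is NULL in SQL."""
    return None if other is None else sqlserver + other


def effective_total(sqlserver: int, other: Optional[int]) -> int:
    """Models ISNULL(total, sqlserver): fall back to SQL-only CPU."""
    total = total_cpu(sqlserver, other)
    return sqlserver if total is None else total


HIGH_CPU_THRESHOLD = 80  # hypothetical threshold, for illustration only

# Linux row (other is NULL): high SQL CPU is still detected.
print(effective_total(92, None) >= HIGH_CPU_THRESHOLD)  # True
# Linux row with a quiet SQL Server: no false spike.
print(effective_total(12, None) >= HIGH_CPU_THRESHOLD)  # False
# Windows row: unchanged, the real host total is used.
print(effective_total(12, 70) >= HIGH_CPU_THRESHOLD)    # True
```

Falling back to the SQL figure rather than 0 keeps genuine SQL-driven spikes visible on Linux while removing the permanent 100% reading.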

Testing

Validated on sql2022 (live):

  • ✅ Actual upgrade file runs idempotently (re-run = no-op), existing data preserved, correct final shape
  • NULL other-process propagates to NULL total through the computed column
  • ✅ Linux-detection returns 0 on Windows → Windows behavior unchanged
  • ✅ Both apps build clean (0 warnings / 0 errors)

Not tested: the actual Linux NULL path — all available SQL instances (2016–2025) are Windows and the behavior can't be faked without the Linux DMV. The logic (host_platform = 'Linux' → NULL) is straightforward; a real Linux instance is the only true confirmation.

Scope note: uses NULL + SQL-only fallback rather than an explicit "host CPU N/A on Linux" chart badge (WPF work, not visually verifiable in this environment). On Linux the Dashboard CPU chart's "Total" line sits on top of the "SQL" line.

🤖 Generated with Claude Code

On SQL Server on Linux the SCHEDULER_MONITOR ring buffer reports
SystemIdle = 0 (a documented platform limitation — Microsoft's own sample
query carries the comment "SystemIdle on Linux will be 0"). The CPU
collector derives other_process_cpu_utilization as
100 - SystemIdle - ProcessUtilization, so on Linux that becomes
100 - 0 - sqlcpu and the host total (sqlserver + other) pins at 100%
forever. SQL Server's own CPU number is correct; only the host/other
figure is fabricated, and no DMV exposes true host CPU on Linux.

Fix: detect Linux (sys.dm_os_host_info.host_platform, via sp_executesql so
SQL 2016 never binds the 2017+ DMV) and store NULL for
other_process_cpu_utilization on Linux instead of a false value. Every
consumer then degrades to the correct SQL-only figure:

- install/18 + Lite RemoteCollectorService.Cpu: store NULL on Linux.
- install/02: other_process_cpu_utilization made nullable; NULL propagates
  to the total_cpu_utilization computed column.
- upgrades/2.11.0-to-2.12.0/03: migrate existing tables — drop the persisted
  computed column, widen the base column to NULL, re-add the computed column
  (ALTER COLUMN is blocked while the computed column depends on it).
  Idempotent and partial-failure safe.
- install/47 views + Dashboard ResourceMetrics/Overview reads: coalesce
  total to ISNULL(total, sqlserver) so charts, MCP, and high-CPU detection
  fall back to SQL CPU rather than NULL/0/100 on Linux.

Windows behavior is unchanged (host_platform != Linux). Fact collectors,
anomaly/baseline/drilldown, and FinOps already key on sqlserver_cpu_utilization
or coalesce NULLs to 0, so they degrade consistently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 9e5812a into dev Jun 2, 2026
6 checks passed
@erikdarlingdata erikdarlingdata deleted the feature/1048-linux-host-cpu branch June 2, 2026 18:56
MisterZeus pushed a commit to MisterZeus/PerformanceMonitor that referenced this pull request Jun 5, 2026
… on Linux

The erikdarlingdata#1049 fix corrected every path that reads CPU from the collected
table (collector, views, chart reads), but the alert engine doesn't read
that table — DatabaseService.NocHealth.GetCpuPercentAsync runs its own
live query against sys.dm_os_ring_buffers and was never touched. It
computes other_cpu_percent = 100 - SystemIdle - ProcessUtilization, and
since SystemIdle is always 0 on SQL Server on Linux, that returns
100 - sqlcpu. AlertHealthResult.TotalCpuPercent then sums to a permanent
100%, so AlertStateService's TotalCpuPercent >= CpuThresholdPercent check
fires the host-CPU alert forever — exactly what the reporter still saw
after installing the nightly.

Fix: apply the same Linux guard used by install/18, RemoteCollectorService.Cpu,
and FinOps.Inventory — detect host_platform via sp_executesql behind an
OBJECT_ID(N'sys.dm_os_host_info', N'V') check (so SQL 2016 never binds the
2017+ DMV) and return NULL for other_cpu_percent on Linux. The existing
TotalCpuPercent getter already falls back to the SQL-only figure when
OtherCpuPercent is null, so the alert clears. Windows behavior is unchanged.
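A minimal Python model of that alert path, for illustration only. The real types are C# (AlertHealthResult, AlertStateService); the field name `sql_cpu_percent`, the function `host_cpu_alert_fires`, and the threshold of 90 are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertHealthResult:
    sql_cpu_percent: int               # hypothetical name for the SQL-only figure
    other_cpu_percent: Optional[int]   # None models NULL returned on Linux

    @property
    def total_cpu_percent(self) -> int:
        # Fall back to the SQL-only figure when the host share is unknown.
        if self.other_cpu_percent is None:
            return self.sql_cpu_percent
        return self.sql_cpu_percent + self.other_cpu_percent


def host_cpu_alert_fires(result: AlertHealthResult, cpu_threshold_percent: int) -> bool:
    """Models the TotalCpuPercent >= CpuThresholdPercent check."""
    return result.total_cpu_percent >= cpu_threshold_percent


# Before the guard on Linux: other = 100 - 0 - 12 = 88, so the total is 100.
before = AlertHealthResult(sql_cpu_percent=12, other_cpu_percent=88)
# After the guard on Linux: other is NULL, so the total is the SQL figure.
after = AlertHealthResult(sql_cpu_percent=12, other_cpu_percent=None)

print(host_cpu_alert_fires(before, 90))  # True
print(host_cpu_alert_fires(after, 90))   # False
```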

Dashboard-only change — no schema or installer impact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>